DOCUMENT RESUME

ED 223 686                                              TM 820 830

AUTHOR        Tatsuoka, Kikumi K.; Tatsuoka, Maurice M.
TITLE         Standardized Extended Caution Indices and Comparisons
              of Their Rule Detection Rates.
INSTITUTION   Illinois Univ., Urbana. Computer-Based Education
              Research Lab.
SPONS AGENCY  Office of Naval Research, Arlington, Va. Personnel
              and Training Research Programs Office.
REPORT NO     CERL-RR-82-4-ONR
PUB DATE      Mar 82
CONTRACT      N000-14-79-C-0752
NOTE          64p.
PUB TYPE      Reports - Research/Technical (143)
EDRS PRICE    MF01/PC03 Plus Postage.
DESCRIPTORS   *Error Patterns; Higher Education; *Latent Trait
              Theory; Models; Responses; *Standardized Tests;
              Statistics; Test Interpretation; *Test Theory
IDENTIFIERS   Caution Index (Sato); *Caution Indices;
              Unidimensional Scaling; Variance (Statistical)

ABSTRACT 

Several extended caution indices (ECIs) have been 
introduced earlier as a link between two distinctly different 
approaches: one based on standard statistics and the other, a 
model-based approach, utilizing item response theory (IRT). Expected 
values and variance of some ECIs are derived and their statistical 
properties are compared and discussed. Then, standardized ECIs are
introduced and their distributions are investigated. It turns out
that the standardized ECIs fit normal distributions well. A
comparison of detection rates among appropriateness measures based on
IRT theory is carried out with the signed-number data set. There is
no noticeable difference in their detection rates using the 80
percent intervals. (Author)



****************************************************************
* Reproductions supplied by EDRS are the best that can be made *
* from the original document.                                  *
****************************************************************

Computer-based Education Research Laboratory

University of Illinois
Urbana, Illinois

STANDARDIZED EXTENDED CAUTION INDICES
AND
COMPARISONS OF THEIR RULE DETECTION RATES



"PERMISSION TO REPRODUCE THIS 
MATERIAL HAS BEEN GRANTED BY 



TO THE EDUCATIONAL RESOURCES 
INFORMATION CENTER (ERIC)." 



KIKUMI K.TATSUOKA 
MAURICE M.TATSUOKA 



U.S. DEPARTMENT OF EDUCATION
NATIONAL INSTITUTE OF EDUCATION
EDUCATIONAL RESOURCES INFORMATION CENTER (ERIC)

X This document has been reproduced as received from the person or
  organization originating it.

  Minor changes have been made to improve reproduction quality.

• Points of view or opinions stated in this document do not
  necessarily represent official NIE position or policy.



Approved for public release; distribution unlimited. 
Reproduction in whole or in part permitted for any 
purpose of the United States Government. 



This research was sponsored by the Personnel and Training
Research Program, Psychological Sciences Division, Office
of Naval Research, under Contract No. N00014-79-C-0752.
Contract Authority Identification Number NR 150-415.



COMPUTERIZED ADAPTIVE TESTING AND MEASUREMENT RESEARCH REPORT 82-4-ONR 



MARCH 1982



Copies of this report 
may be requested from: 



Kikumi K. Tatsuoka 
252 ERL 

103 S. Mathews
University of Illinois
Urbana, IL 61801



Unclassified
SECURITY CLASSIFICATION OF THIS PAGE (When Data Entered)

REPORT DOCUMENTATION PAGE
(READ INSTRUCTIONS BEFORE COMPLETING FORM)

1.   REPORT NUMBER: Research Report 82-4-ONR
2.   GOVT ACCESSION NO.:
3.   RECIPIENT'S CATALOG NUMBER:
4.   TITLE (and Subtitle): Standardized Extended Caution Indices and
     Comparisons of their Rule Detection Rates
5.   TYPE OF REPORT & PERIOD COVERED:
6.   PERFORMING ORG. REPORT NUMBER:
7.   AUTHOR(s): Kikumi K. Tatsuoka & Maurice M. Tatsuoka
8.   CONTRACT OR GRANT NUMBER(s): N00014-79-C-0752
9.   PERFORMING ORGANIZATION NAME AND ADDRESS:
     Computer-based Education Research Laboratory
     103 S. Mathews, 252 ERL, U of Illinois
     Urbana, IL 61801
10.  PROGRAM ELEMENT, PROJECT, TASK AREA & WORK UNIT NUMBERS:
     61153N; RR 042-04
11.  CONTROLLING OFFICE NAME AND ADDRESS:
     Personnel and Training Research Programs
     Office of Naval Research (Code 442)
     Arlington, VA 22217
12.  REPORT DATE: March 1982
13.  NUMBER OF PAGES:
14.  MONITORING AGENCY NAME & ADDRESS (if different from Controlling
     Office):
15.  SECURITY CLASS. (of this report):
15a. DECLASSIFICATION/DOWNGRADING SCHEDULE:
16.  DISTRIBUTION STATEMENT (of this Report):
     Approved for public release; distribution unlimited
17.  DISTRIBUTION STATEMENT (of the abstract entered in Block 20, if
     different from Report):
18.  SUPPLEMENTARY NOTES:
19.  KEY WORDS (Continue on reverse side if necessary and identify by
     block number):
     expected values, extended caution index, variances, item response
     theory, standardized appropriateness measures, detection rate of
     aberrant response patterns, signed-number subtraction.
20.  ABSTRACT (Continue on reverse side if necessary and identify by
     block number):
     Several extended caution indices (ECIs) have been introduced
     earlier as a link between two distinctly different approaches: one
     based on the standard statistics and the other, a model-based
     approach utilizing item response theory (IRT). Expected values and
     variances of some ECIs are derived and their statistical
     properties are compared and discussed. Then, standardized ECIs are
     introduced and their distributions are investigated. It turns out
     that the standardized ECIs fit normal distributions well. A
     comparison of detection rates among appropriateness measures based
     on IRT theory is carried out with the signed-number dataset. There
     is no noticeable difference in their detection rates using the 80%
     intervals.

DD FORM 1473, 1 JAN 73. EDITION OF 1 NOV 65 IS OBSOLETE.
S/N 0102-LF-014-6601

Unclassified
SECURITY CLASSIFICATION OF THIS PAGE (When Data Entered)


Acknowledgement

This research was sponsored by the Personnel and Training Research
Program, Psychological Sciences Division, Office of Naval Research,
under contract No. N00014-79-C-0752.

Several of the analyses presented in this report were performed on
the PLATO® system. The PLATO® system is a development of the University
of Illinois, and PLATO® is a service mark of Control Data Corporation.

The authors gratefully acknowledge the painstaking and creative computer
programming carried out by Robert Baillie on both the PLATO and CYBER 175
systems. Delwyn Harnisch provided us with estimated parameter values for
the NAEP data. Gerard Chevalaz plotted the graphs of the standard errors
of ECI1-4 in Appendices VII through X, using the SPSS package. Louise
Brodie did the painstaking typing of the manuscript, replete with equations,
on the PLATO system. We are indebted to Roy Lipschutz for the artwork.



Abstract 

Several extended caution indices (ECIs) have been introduced 
earlier as a link between two distinctly different approaches: one 
based on standard statistics and the other, a model-based approach 
utilizing item response theory (IRT). Expected values and variances of 
some ECIs are derived and their statistical properties are compared and 
discussed. Then, standardized ECIs are introduced and their 
distributions are investigated. It turns out that the standardized ECIs
fit normal distributions well. A comparison of detection rates among
appropriateness measures based on IRT theory is carried out with the
signed-number dataset. There is no noticeable difference in their
detection rates using the 80% intervals.

Introduction

An increasing number of researchers have begun to show interest in
using response patterns of n items for analyzing performance on test
scores. By so doing, more information is obtainable than by using only
traditional total scores. Tatsuoka and her colleagues (Birenbaum &
Tatsuoka, 1982a, b; Tatsuoka & Tatsuoka, 1982a) have demonstrated that
some wrong rules of arithmetic computation (fractions and signed
numbers) can produce the right score of 1 on as much as 80% of the test
items. If many students apply a variety of wrong rules consistently
throughout the test, then these faulty rules cause a serious problem by
violating the unidimensionality assumption of a dataset. After
rescoring these correct responses obtained by faulty rules, the dataset
became nearly unidimensional. They have developed several indices to
detect aberrant response patterns resulting from consistent application
of wrong rules (Tatsuoka & Tatsuoka, 1982b) and have shown one of them,
the individual consistency index (ICI), to spot more than 90% of such
aberrant response patterns (Tatsuoka & Tatsuoka, 1981).

Rudner (1982) investigated the detection rates of various personal 
indices (norm conformity index, caution index, personal biserial and 
appropriateness measures based on item response theory) and found that
the indices based on IRT are more efficient for detecting anomalous 
response patterns than those based on observed item response and summary 
statistics. However, estimating parameters of IRT models requires a 
substantial number of subjects while it is often impossible to have such 
a large sample size in many classroom settings. 

Sato (1975) developed the caution index in conjunction with S-P
curve theory and successfully used it for diagnosing students'
performance and evaluating instructional materials in Japan. Harnisch
and Linn (1981) demonstrated its usefulness by applying it to a NAEP
dataset (National Assessment of Educational Progress). Although their
analysis is based on a large dataset, their results show clearly that
analysis of response patterns as a whole provides very useful information
associated with individual differences, curriculum differences and
school differences.

The concepts of S-P curve theory and caution index have been
extended to the continuous domain of IRT models from the approach based
on the discrete summary statistics by Tatsuoka and Linn (1982). They
have developed five alternative indices and named them extended
caution indices 1, 2, 3, 4 and 5. In this paper, further statistical
properties of ECI1, ECI2, and ECI4 will be discussed and their
detection rates will be compared.

Statistical Properties of Extended Caution Indices

Definition of the Extended Caution Indices

A group of extended caution indices (ECIs) has been introduced as a
link between two distinct approaches to detecting aberrant response
patterns (Tatsuoka & Linn, 1981). One is based on the use of binary
response patterns and their standard summary statistics (Sato, 1975;
van der Flier, 1977; Tatsuoka & Tatsuoka, 1980, 1982a), while the other is
a model-based approach. In the latter, the patterns of probabilities
that are derived from item response theory are utilized in calculating
appropriateness measures together with observed binary response patterns
(Wright, 1977; Drasgow, 1978; Levine & Rubin, 1979). ECIs are an
extension of Sato's caution index to the approach using IRT. In this
section, three of the five ECIs will be investigated in terms of their
expected values, variances, and advantages and disadvantages.

Let $y_{ij}$ ($i = 1, \ldots, N$; $j = 1, \ldots, n$) be the binary
score of subject i on item j, $y_{i\cdot}$ be the ith row sum, and
$y_{\cdot j}$ the jth column sum of the data matrix $(y_{ij})$. Let
$P_{ij}$ be the probability of subject i answering item j correctly,
which may be based on the one-, two- or three-parameter logistic model.
That is,

$$P_{ij} = c_j + \frac{1 - c_j}{1 + \exp[-D a_j (\theta_i - b_j)]}$$

where $c_j = 0$ and $a_j = 1$ for the one-parameter logistic model, and
$c_j = 0$ for the two-parameter logistic model. Thus, two data matrices
may be introduced: one comprising observed binary scores of n items for
N subjects, $(y_{ij})$, and the other consisting of $(P_{ij})$. We refer
to $(y_{ij})$ as the observed binary matrix and $(P_{ij})$ as the
probability matrix.
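
The probability matrix described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' program: the function names, the item parameter values, and the scaling constant D = 1.7 are assumptions made here for the example.

```python
import math

def p_correct(theta, a=1.0, b=0.0, c=0.0, D=1.7):
    """Three-parameter logistic probability of a correct response.

    With c = 0 this is the two-parameter model; with c = 0 and a = 1
    it is the one-parameter model. D = 1.7 is an assumed scaling constant.
    """
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))

def probability_matrix(thetas, items, D=1.7):
    """Return the N x n matrix (P_ij) as a list of rows.

    thetas: ability values, one per subject.
    items:  list of (a_j, b_j, c_j) tuples, one per item.
    """
    return [[p_correct(t, a, b, c, D) for (a, b, c) in items] for t in thetas]

# Hypothetical parameters, for illustration only.
items = [(1.0, -1.0, 0.0), (1.2, 0.0, 0.2), (0.8, 1.0, 0.25)]
P = probability_matrix([-1.0, 0.0, 1.0], items)
```

Each row of `P` plays the role of the probability vector for one subject, and each column corresponds to one item.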

Let $G_j$ be the jth element of a vector approximating the group
response curve (GRC) for item j, and $T_i$ be that of the vector for the
test response curve (TRC) for subject i. Then

$$G_j = \frac{1}{N} \sum_{i=1}^{N} P_{ij}, \qquad T_i = \frac{1}{n} \sum_{j=1}^{n} P_{ij}.$$

In other words, $G_j$ for item j and $T_i$ for subject i are the jth
column sum and the ith row sum, respectively, of the probability matrix
$(P_{ij})$, each divided by the number of terms in the sum.

Three of the five ECIs are defined as complements of the ratio of
two covariances between various pairs of row vectors taken from
the two matrices.

$$\mathrm{ECI1}_i = 1 - \frac{\mathrm{cov}(\mathbf{y}_i, \mathbf{y}_{\cdot})}{\mathrm{cov}(\mathbf{P}_i, \mathbf{y}_{\cdot})} \qquad (1)$$

$$\mathrm{ECI2}_i = 1 - \frac{\mathrm{cov}(\mathbf{y}_i, \mathbf{G})}{\mathrm{cov}(\mathbf{P}_i, \mathbf{G})} \qquad (2)$$

$$\mathrm{ECI4}_i = 1 - \frac{\mathrm{cov}(\mathbf{y}_i, \mathbf{P}_i)}{\mathrm{cov}(\mathbf{G}, \mathbf{P}_i)} \qquad (3)$$

where

$\mathbf{y}_i = (y_{i1}, y_{i2}, \ldots, y_{in})$ is the vector of binary
scores for subject i, or the ith row vector,

$\mathbf{y}_{\cdot} = (y_{\cdot 1}, y_{\cdot 2}, \ldots, y_{\cdot n})$ is
the column-sum vector in the observed binary matrix,

$\mathbf{P}_i = (P_{i1}, P_{i2}, \ldots, P_{in})$ is the probability
vector from the ith row in the probability matrix, and

$\mathbf{G} = (G_1, G_2, \ldots, G_n)$ is the GRC vector, which is built
from the column sums of $(P_{ij})$.

Expression (1) is defined by forming the ratio of the following
covariances: the numerator is the covariance of subject i's response
pattern and the column-sum vector over n items in $(y_{ij})$, and the
denominator is the covariance of the ith row probability vector derived
from a logistic model and the column-sum vector in $(y_{ij})$. Expressions
(2) and (3) have the same denominator, the covariance of the GRC vector
and the ith probability vector, and the numerators are covariances of
the response pattern vector with the GRC vector and the probability
vector, respectively.

When $\mathbf{y}_i$ consists of all 1s or 0s, the second terms of the
ECIs become undetermined.
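
As a rough illustration of Equations (1), (2) and (3), the following Python sketch computes the three indices for one subject from an observed binary matrix and a probability matrix. The function names and the small data matrices are hypothetical, and the GRC vector is taken here as the vector of column means of the probability matrix (an assumption; rescaling G does not change the covariance ratios).

```python
def mean(v):
    return sum(v) / len(v)

def cov(u, v):
    """Covariance of two equal-length vectors, dividing by n."""
    mu, mv = mean(u), mean(v)
    return sum((x - mu) * (y - mv) for x, y in zip(u, v)) / len(u)

def extended_caution_indices(Y, P, i):
    """Return (ECI1, ECI2, ECI4) for subject i.

    Y: observed binary matrix (list of rows of 0/1 scores).
    P: probability matrix with the same shape as Y.
    A zero denominator (for example a constant probability row)
    raises ZeroDivisionError; that case is not handled in this sketch.
    """
    n_items = len(Y[0])
    y_i, p_i = Y[i], P[i]
    y_col = [sum(row[j] for row in Y) for j in range(n_items)]      # y.
    G = [sum(row[j] for row in P) / len(P) for j in range(n_items)]  # GRC
    eci1 = 1 - cov(y_i, y_col) / cov(p_i, y_col)
    eci2 = 1 - cov(y_i, G) / cov(p_i, G)
    eci4 = 1 - cov(y_i, p_i) / cov(G, p_i)
    return eci1, eci2, eci4

# Hypothetical data: 3 subjects, 4 items ordered from easy to hard.
Y = [[1, 1, 0, 0], [0, 0, 1, 1], [1, 1, 1, 0]]
P = [[0.9, 0.7, 0.4, 0.2], [0.9, 0.7, 0.4, 0.2], [0.95, 0.8, 0.6, 0.3]]
conforming = extended_caution_indices(Y, P, 0)
reversed_pattern = extended_caution_indices(Y, P, 1)
```

In this toy example the second subject answers only the hard items correctly, so the covariance of the response pattern with G is negative and the index exceeds 1, whereas the first subject's pattern yields a smaller value.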

The Expectations of ECI1, ECI2 and ECI4

In this section, the expectations and variances of the three ECIs
given by Equations (1), (2) and (3) will be derived. The actual
values of the ECIs for subject i can be calculated by replacing the item
and person parameters with their estimated values $\hat{a}_j$,
$\hat{b}_j$ and $\hat{\theta}_i$ based on the maximum likelihood method.
It is known that the maximum likelihood estimates of item and person
parameters satisfy the likelihood conditions (Lord and Novick, 1968)
given in Equations (4).

6 

n n 

Since the ECIs are functions of the person parameter $\theta_i$, the
conditional expected values and variances of the ECIs for a fixed
ability level will be introduced. Hereafter, the circumflex on
$\hat{P}_{ij}$ (and its ith-row vector $\hat{\mathbf{P}}_i$) will be
omitted to simplify the notation.

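ECI1

The conditional expectation of the first ECI defined in Equation
(1) is given by the following:

$$E(\mathrm{ECI1}_i \mid \theta_i) = 1 - E\left( \frac{\mathrm{cov}(\mathbf{y}_k, \mathbf{y}_{\cdot})}{\mathrm{cov}(\mathbf{P}_i, \mathbf{y}_{\cdot})} \,\Big|\, \theta_i \right) = 1 - \frac{E[\mathrm{cov}(\mathbf{y}_k, \mathbf{y}_{\cdot}) \mid \theta_i]}{\mathrm{cov}(\mathbf{P}_i, \mathbf{y}_{\cdot})}$$

The observed vector $\mathbf{y}_k$ is a random vector at the level
$\theta_i$ and the expectation is obtained over k. Now, we have to find
the expectation in the numerator of the second fraction,
$E[\mathrm{cov}(\mathbf{y}_k, \mathbf{y}_{\cdot}) \mid \theta_i]$. First,
the covariance of $\mathbf{y}_k$ and $\mathbf{y}_{\cdot}$ is rewritten as
the summation of the product of the deviations:

$$E[\mathrm{cov}(\mathbf{y}_k, \mathbf{y}_{\cdot}) \mid \theta_i] = E\Big[ \sum_{j=1}^{n} (y_{kj} - p_{i\cdot})(y_{\cdot j} - p_{\cdot\cdot}) \,\Big|\, \theta_i \Big] \Big/ n$$

where $p_{i\cdot}$ is the ith row mean of $(y_{ij})$ and $p_{\cdot\cdot}$
is the mean of the row means or column means as follows,

$$p_{i\cdot} = \frac{1}{n} \sum_{j=1}^{n} y_{ij}, \qquad p_{\cdot\cdot} = \frac{1}{N} \sum_{i=1}^{N} p_{i\cdot} .$$

By using the second members of Equations (4), this expectation
reduces to the covariance of $\mathbf{P}_i$ and $\mathbf{y}_{\cdot}$.
Thus, the conditional expectation of ECI1 at the fixed level $\theta_i$
becomes zero, as summarized in Equation (6).

$$E(\mathrm{ECI1}_i \mid \theta_i) = 1 - \frac{\mathrm{cov}(\mathbf{P}_i, \mathbf{y}_{\cdot})}{\mathrm{cov}(\mathbf{P}_i, \mathbf{y}_{\cdot})} = 0 . \qquad (6)$$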

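The conditional variance of ECI1 at the fixed level $\theta_i$ is

$$\mathrm{Var}(\mathrm{ECI1} \mid \theta_i) = E\{[\mathrm{ECI1} - E(\mathrm{ECI1} \mid \theta_i)]^2 \mid \theta_i\} . \qquad (7)$$

By substituting the result from (6), the conditional variance
(7) becomes $E(\mathrm{ECI1}^2 \mid \theta_i)$. That is:

$$E(\mathrm{ECI1}^2 \mid \theta_i) = E\left( \left[ 1 - \frac{\mathrm{cov}(\mathbf{y}_k, \mathbf{y}_{\cdot})}{\mathrm{cov}(\mathbf{P}_i, \mathbf{y}_{\cdot})} \right]^2 \,\Big|\, \theta_i \right) = -1 + \frac{E[\mathrm{cov}^2(\mathbf{y}_k, \mathbf{y}_{\cdot}) \mid \theta_i]}{\mathrm{cov}^2(\mathbf{P}_i, \mathbf{y}_{\cdot})} \qquad (8)$$

where we have again used the fact that
$E[\mathrm{cov}(\mathbf{y}_k, \mathbf{y}_{\cdot}) \mid \theta_i] = \mathrm{cov}(\mathbf{P}_i, \mathbf{y}_{\cdot})$.
The numerator of the last term of Equation (8), however, can be expanded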
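to the sum of the diagonal and off-diagonal terms, and then by applying
the conditions given in Equations (4), we obtain Equation (9).

$$\frac{1}{n^2} E\Big( \Big[ \sum_{j=1}^{n} (y_{kj} - p_{i\cdot})(y_{\cdot j} - p_{\cdot\cdot}) \Big]^2 \,\Big|\, \theta_i \Big) = \frac{1}{n^2} \Big\{ E\Big[ \sum_{j=1}^{n} (y_{kj} - p_{i\cdot})^2 (y_{\cdot j} - p_{\cdot\cdot})^2 \,\Big|\, \theta_i \Big] + E\Big[ \sum_{j \ne h} (y_{kj} - p_{i\cdot})(y_{kh} - p_{i\cdot})(y_{\cdot j} - p_{\cdot\cdot})(y_{\cdot h} - p_{\cdot\cdot}) \,\Big|\, \theta_i \Big] \Big\} \qquad (9)$$

The first term, the diagonal part inside the parentheses of the above
equation, is:

$$E\Big[ \sum_{j=1}^{n} (y_{kj} - p_{i\cdot})^2 (y_{\cdot j} - p_{\cdot\cdot})^2 \,\Big|\, \theta_i \Big] = \sum_{j=1}^{n} (y_{\cdot j} - p_{\cdot\cdot})^2 E[(y_{kj} - p_{i\cdot})^2 \mid \theta_i] = \sum_{j=1}^{n} (y_{\cdot j} - p_{\cdot\cdot})^2 [P_{ij}(1 - P_{ij}) + (P_{ij} - T_i)^2]$$

The second term inside the parentheses is:

$$\sum_{j \ne h} (y_{\cdot j} - p_{\cdot\cdot})(y_{\cdot h} - p_{\cdot\cdot}) E[(y_{kj} - p_{i\cdot}) \mid \theta_i] \, E[(y_{kh} - p_{i\cdot}) \mid \theta_i] = \sum_{j \ne h} (y_{\cdot j} - p_{\cdot\cdot})(y_{\cdot h} - p_{\cdot\cdot})(P_{ij} - T_i)(P_{ih} - T_i)$$

Adding the results of the two expectations gives Equation (10), where
$\sigma_{ij}^2 = P_{ij}(1 - P_{ij})$.

$$\frac{1}{n^2} E\Big( \Big[ \sum_{j=1}^{n} (y_{kj} - p_{i\cdot})(y_{\cdot j} - p_{\cdot\cdot}) \Big]^2 \,\Big|\, \theta_i \Big) = \mathrm{cov}^2(\mathbf{y}_{\cdot}, \mathbf{P}_i) + \frac{1}{n^2} \sum_{j=1}^{n} \sigma_{ij}^2 (y_{\cdot j} - p_{\cdot\cdot})^2 \qquad (10)$$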
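
Substituting (10) in Equation (8), the variance of ECI1 becomes:

$$\mathrm{Var}(\mathrm{ECI1}) = -1 + \frac{\mathrm{cov}^2(\mathbf{y}_{\cdot}, \mathbf{P}_i) + \sum_{j=1}^{n} \sigma_{ij}^2 (y_{\cdot j} - p_{\cdot\cdot})^2 / n^2}{\mathrm{cov}^2(\mathbf{P}_i, \mathbf{y}_{\cdot})} = \frac{\sum_{j=1}^{n} \sigma_{ij}^2 (y_{\cdot j} - p_{\cdot\cdot})^2}{n^2 \, \mathrm{cov}^2(\mathbf{P}_i, \mathbf{y}_{\cdot})} \qquad (11)$$
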

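ECI2

The conditional expectation of the second ECI is given by

$$E(\mathrm{ECI2} \mid \theta_i) = 1 - \frac{E[\mathrm{cov}(\mathbf{y}_k, \mathbf{G}) \mid \theta_i]}{\mathrm{cov}(\mathbf{P}_i, \mathbf{G})} \qquad (12)$$

But

$$E[\mathrm{cov}(\mathbf{y}_k, \mathbf{G}) \mid \theta_i] = E\Big[ \frac{1}{n} \sum_{j=1}^{n} (y_{kj} - p_{i\cdot})(G_j - T) \,\Big|\, \theta_i \Big] = \frac{1}{n} \sum_{j=1}^{n} E[(y_{kj} - p_{i\cdot}) \mid \theta_i] (G_j - T) = \frac{1}{n} \sum_{j=1}^{n} (P_{ij} - T_i)(G_j - T) = \mathrm{cov}(\mathbf{P}_i, \mathbf{G}) ,$$

where

$$T = \sum_{i=1}^{N} T_i / N = \sum_{j=1}^{n} G_j / n .$$

By substituting this result in Equation (12), we get (13).

$$E(\mathrm{ECI2} \mid \theta_i) = 1 - \frac{\mathrm{cov}(\mathbf{P}_i, \mathbf{G})}{\mathrm{cov}(\mathbf{P}_i, \mathbf{G})} = 0 . \qquad (13)$$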
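
The conditional variance of ECI2 is given by Equation (14),

$$\mathrm{Var}(\mathrm{ECI2} \mid \theta_i) = E\{[\mathrm{ECI2} - E(\mathrm{ECI2} \mid \theta_i)]^2 \mid \theta_i\} = E(\mathrm{ECI2}^2 \mid \theta_i) = -1 + \frac{E[\mathrm{cov}^2(\mathbf{y}_k, \mathbf{G}) \mid \theta_i]}{\mathrm{cov}^2(\mathbf{G}, \mathbf{P}_i)} \qquad (14)$$

The expectation of the squared covariance of $\mathbf{y}_k$ and
$\mathbf{G}$ can be simplified and given by Equation (15).

$$E[\mathrm{cov}^2(\mathbf{y}_k, \mathbf{G}) \mid \theta_i] = \mathrm{cov}^2(\mathbf{P}_i, \mathbf{G}) + \frac{1}{n^2} \sum_{j=1}^{n} \sigma_{ij}^2 (G_j - T)^2 . \qquad (15)$$

By substituting (15) in (14), we get (16).

$$\mathrm{Var}(\mathrm{ECI2} \mid \theta_i) = \frac{\sum_{j=1}^{n} (G_j - T)^2 \sigma_{ij}^2}{n^2 \, \mathrm{cov}^2(\mathbf{P}_i, \mathbf{G})} \qquad (16)$$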
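
The results in Equations (13) and (16) can be checked numerically for a short test by enumerating every possible response pattern. The sketch below does this under two assumptions that are spelled out here rather than taken from the report: responses are locally independent Bernoulli variables with success probabilities P_ij, and sigma_ij^2 = P_ij(1 - P_ij). The function name and the example vectors are hypothetical.

```python
from itertools import product

def cov(u, v):
    """Covariance of two equal-length vectors, dividing by n."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((x - mu) * (y - mv) for x, y in zip(u, v)) / len(u)

def eci2_moments(p_i, G):
    """Return (mean, enumerated variance, formula variance) of ECI2.

    The mean and enumerated variance are exact expectations over all
    2**n binary patterns, weighted by their probabilities under local
    independence. The formula variance is Equation (16) with
    sigma_ij^2 = P_ij (1 - P_ij), an assumption of this sketch.
    """
    n = len(p_i)
    T = sum(G) / n
    c = cov(p_i, G)
    first, second = 0.0, 0.0
    for y in product((0, 1), repeat=n):
        prob = 1.0
        for y_j, p_j in zip(y, p_i):
            prob *= p_j if y_j else 1.0 - p_j
        eci2 = 1.0 - cov(y, G) / c
        first += prob * eci2
        second += prob * eci2 ** 2
    var_enum = second - first ** 2
    var_formula = sum(p * (1 - p) * (g - T) ** 2
                      for p, g in zip(p_i, G)) / (n ** 2 * c ** 2)
    return first, var_enum, var_formula

# Hypothetical probability vector and GRC vector for a 4-item test.
mean_eci2, var_enum, var_formula = eci2_moments([0.9, 0.7, 0.4, 0.2],
                                                [0.8, 0.6, 0.5, 0.3])
```

Enumeration is feasible only for small n, since the number of patterns doubles with each item; it is used here purely as a consistency check on the algebra.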
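ECI4

The conditional expectation of ECI4 is

$$E(\mathrm{ECI4} \mid \theta_i) = 1 - E\left[ \frac{\mathrm{cov}(\mathbf{y}_k, \mathbf{P}_i)}{\mathrm{cov}(\mathbf{G}, \mathbf{P}_i)} \,\Big|\, \theta_i \right]$$

where $\mathbf{y}_k$ is a random variable from the distribution of binary
responses to n items at the fixed ability level $\theta_i$. Since the
denominator of the expected value, $\mathrm{cov}(\mathbf{G}, \mathbf{P}_i)$,
is fixed at level $\theta_i$, the second term will be simply the
expectation of the numerator divided by the covariance of $\mathbf{G}$
and $\mathbf{P}_i$,
$E[\mathrm{cov}(\mathbf{y}_k, \mathbf{P}_i) \mid \theta_i] / \mathrm{cov}(\mathbf{G}, \mathbf{P}_i)$.

$$E[\mathrm{cov}(\mathbf{y}_k, \mathbf{P}_i) \mid \theta_i] = \frac{1}{n} \sum_{j=1}^{n} E[(y_{kj} - p_{i\cdot}) \mid \theta_i] (P_{ij} - T_i)$$

But $E(y_{kj} - p_{i\cdot} \mid \theta_i) = P_{ij} - T_i$ because of
Equations (4), so this expectation equals
$\frac{1}{n} \sum_{j=1}^{n} (P_{ij} - T_i)^2 = \mathrm{Var}(\mathbf{P}_i)$.
Therefore,

$$E(\mathrm{ECI4} \mid \theta_i) = 1 - \frac{\mathrm{Var}(\mathbf{P}_i)}{\mathrm{cov}(\mathbf{G}, \mathbf{P}_i)} \qquad (18)$$

The conditional variance of ECI4 is given by Equation (19).

$$\mathrm{Var}(\mathrm{ECI4} \mid \theta_i) = E\{[\mathrm{ECI4} - E(\mathrm{ECI4} \mid \theta_i)]^2 \mid \theta_i\} \qquad (19)$$

Substituting the expectation of ECI4 from Equation (18), (19) becomes

$$\mathrm{Var}(\mathrm{ECI4} \mid \theta_i) = E\left\{ \left[ \frac{\mathrm{Var}(\mathbf{P}_i)}{\mathrm{cov}(\mathbf{G}, \mathbf{P}_i)} - \frac{\mathrm{cov}(\mathbf{y}_k, \mathbf{P}_i)}{\mathrm{cov}(\mathbf{G}, \mathbf{P}_i)} \right]^2 \,\Big|\, \theta_i \right\}$$

A straightforward expansion of the inside of the parentheses leads to
Equation (20).

$$\mathrm{Var}(\mathrm{ECI4} \mid \theta_i) = \frac{E[\mathrm{cov}^2(\mathbf{y}_k, \mathbf{P}_i) \mid \theta_i]}{\mathrm{cov}^2(\mathbf{G}, \mathbf{P}_i)} - \frac{\mathrm{Var}^2(\mathbf{P}_i)}{\mathrm{cov}^2(\mathbf{G}, \mathbf{P}_i)} \qquad (20)$$

The numerator of the first term,
$E[\mathrm{cov}^2(\mathbf{y}_k, \mathbf{P}_i) \mid \theta_i]$, can be
simplified in the same manner as in the case of ECI1.

$$E[\mathrm{cov}^2(\mathbf{y}_k, \mathbf{P}_i) \mid \theta_i] = \frac{1}{n^2} E\Big( \Big[ \sum_{j=1}^{n} (y_{kj} - p_{i\cdot})(P_{ij} - T_i) \Big]^2 \,\Big|\, \theta_i \Big) = \frac{1}{n^2} \Big\{ E\Big[ \sum_{j=1}^{n} (y_{kj} - p_{i\cdot})^2 (P_{ij} - T_i)^2 \,\Big|\, \theta_i \Big] + E\Big[ \sum_{j \ne h} (y_{kj} - p_{i\cdot})(y_{kh} - p_{i\cdot})(P_{ij} - T_i)(P_{ih} - T_i) \,\Big|\, \theta_i \Big] \Big\}$$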
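
Because of local independence and Equation (4), we obtain the following
two relations:

$$E\Big[ \sum_{j=1}^{n} (y_{kj} - p_{i\cdot})^2 (P_{ij} - T_i)^2 \,\Big|\, \theta_i \Big] = \sum_{j=1}^{n} [\sigma_{ij}^2 + (P_{ij} - T_i)^2] (P_{ij} - T_i)^2$$

and

$$E\Big[ \sum_{j \ne h} (y_{kj} - p_{i\cdot})(y_{kh} - p_{i\cdot})(P_{ij} - T_i)(P_{ih} - T_i) \,\Big|\, \theta_i \Big] = \sum_{j \ne h} (P_{ij} - T_i)^2 (P_{ih} - T_i)^2 .$$

By adding the results, we obtain

$$E[\mathrm{cov}^2(\mathbf{y}_k, \mathbf{P}_i) \mid \theta_i] = \frac{1}{n^2} \Big\{ \Big[ \sum_{j=1}^{n} (P_{ij} - T_i)^2 \Big]^2 + \sum_{j=1}^{n} \sigma_{ij}^2 (P_{ij} - T_i)^2 \Big\} = \mathrm{Var}^2(\mathbf{P}_i) + \frac{1}{n^2} \sum_{j=1}^{n} \sigma_{ij}^2 (P_{ij} - T_i)^2 . \qquad (21)$$

By substituting (21) in (20), we get Equation (22), the variance of ECI4.

$$\mathrm{Var}(\mathrm{ECI4} \mid \theta_i) = \frac{\mathrm{cov}^2(\mathbf{P}_i, \mathbf{P}_i) + \frac{1}{n^2} \sum_{j=1}^{n} \sigma_{ij}^2 (P_{ij} - T_i)^2}{\mathrm{cov}^2(\mathbf{G}, \mathbf{P}_i)} - \frac{\mathrm{cov}^2(\mathbf{P}_i, \mathbf{P}_i)}{\mathrm{cov}^2(\mathbf{G}, \mathbf{P}_i)} = \frac{\sum_{j=1}^{n} \sigma_{ij}^2 (P_{ij} - T_i)^2}{n^2 \, \mathrm{cov}^2(\mathbf{G}, \mathbf{P}_i)} \qquad (22)$$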
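
For a concrete sense of how Equations (18) and (22) behave, the following Python sketch evaluates the conditional expectation and standard error of ECI4 for one probability vector. It is a minimal illustration: the function name and example vectors are hypothetical, and sigma_ij^2 is taken to be P_ij(1 - P_ij), which is an assumption of the sketch.

```python
import math

def eci4_expectation_and_se(p_i, G):
    """Conditional expectation (Eq. 18) and standard error (square root
    of Eq. 22) of ECI4 for one subject.

    p_i: probability vector for the subject (row of the probability matrix).
    G:   GRC vector.
    Assumes sigma_ij^2 = P_ij (1 - P_ij) and a nonzero cov(G, P_i).
    """
    n = len(p_i)
    T_i = sum(p_i) / n
    T = sum(G) / n
    var_p = sum((p - T_i) ** 2 for p in p_i) / n
    cov_gp = sum((g - T) * (p - T_i) for g, p in zip(G, p_i)) / n
    expectation = 1.0 - var_p / cov_gp
    variance = sum(p * (1 - p) * (p - T_i) ** 2
                   for p in p_i) / (n ** 2 * cov_gp ** 2)
    return expectation, math.sqrt(variance)

# Hypothetical vectors for a 4-item test.
G = [0.8, 0.6, 0.5, 0.3]
exp_mid, se_mid = eci4_expectation_and_se([0.9, 0.7, 0.4, 0.2], G)
exp_high, se_high = eci4_expectation_and_se([0.99, 0.97, 0.95, 0.90], G)
```

Unlike ECI1 and ECI2, the expectation returned here is generally not zero, because it depends on how the subject's probability vector relates to the GRC vector.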

Comparison of Some Statistical Properties of the Three Indices
ECI1, ECI2, and ECI4

Comparison of the Standard Errors

The conditional expectations of the three indices are different in
a manner that suggests that ECI1 and ECI2 are similar to each other,
while ECI4 stands alone. ECI1 and ECI2 have the constant expectation
zero, regardless of the level of person parameter $\theta_i$. On the
other hand, the expectation of ECI4 is a function of $\theta_i$, as shown
in Figure 1 for the dataset obtained from a 32-item signed-number
subtraction test.

Insert Figure 1 about here

The x-axis represents true scores and the y-axis the 127 students'
expected ECI4 values. The curve in Figure 1 decreases monotonically as
the true score decreases. The standard error of ECI4 is the square root
of expression (22) and is also a function of $\theta$. Figure 2 shows the
relationship between the standard error and the true scores. (The
estimated true score of IRT was used instead of $\theta_i$ so as to have
a value between 0 and 1, which facilitates comparison across different
tests.)

Insert Figure 2 about here

For students whose true scores are extremely high or low, the
standard-error curve rises sharply, while for average scores, it becomes
rather flat.

Figures 3 and 4 are plots of the standard errors [square roots of
expressions (11) and (16)] of ECI1 and ECI2 against true score as the
x-axis. They are almost identical curves that are nearly horizontal for
the average true scores but increase rather rapidly at both the high and
low extremes of true scores.

FIGURE 1: Expectation of ECI4 Plotted Against the True Score

FIGURE 2: The Standard Error of ECI4 Plotted Against the True Score

Insert Figures 3 & 4 about here

FIGURE 3: The Standard Error of ECI1 Plotted Against the True Score

FIGURE 4: The Standard Error of ECI2 Plotted Against the True Score

ECI1 and ECI2 correlate highly (r = .97, see Appendix XI) and have
the same constant expectation of zero. Moreover, their standard errors
have almost identical curves when plotted against true scores, so we
will drop ECI1 hereafter and make comparisons between ECI2 and ECI4.
Since ECI2 is defined by using the elements in the probability matrix
$(P_{ij})$, the investigation of ECI2 and ECI4 will be more interesting.

Standardized Extended Caution Indices, ECI2z and ECI4z, and their
Density Functions

ECIs can be standardized by subtracting their expected values and
then dividing by their standard errors. Equations (23) and (24) are
the standardized extended caution indices ECI2z and ECI4z.

$$\mathrm{ECI2}_z = \frac{\mathrm{ECI2} - E(\mathrm{ECI2} \mid \theta_i)}{SE(\mathrm{ECI2} \mid \theta_i)} = \frac{n \, \mathrm{cov}(\mathbf{P}_i - \mathbf{y}_i, \mathbf{G})}{\Big[ \sum_{j=1}^{n} \sigma_{ij}^2 (G_j - T)^2 \Big]^{1/2}} \qquad (23)$$

$$\mathrm{ECI4}_z = \frac{\mathrm{ECI4} - E(\mathrm{ECI4} \mid \theta_i)}{SE(\mathrm{ECI4} \mid \theta_i)} = \frac{n \, \mathrm{cov}(\mathbf{P}_i - \mathbf{y}_i, \mathbf{P}_i)}{\Big[ \sum_{j=1}^{n} \sigma_{ij}^2 (P_{ij} - T_i)^2 \Big]^{1/2}} \qquad (24)$$

As can be seen in Equations (23) and (24), the second variables of the
covariances in the numerators are $\mathbf{G}$ and $\mathbf{P}_i$,
respectively. The denominator for ECI2z involves the group-oriented
vector $\mathbf{G} - T\mathbf{1}$ while that for ECI4z involves the
individual-oriented vector at the level $\theta_i$,
$\mathbf{P}_i - T_i\mathbf{1}$. Tatsuoka and Linn (1982) argue that ECI4
may correspond to the individual consistency index (ICI) introduced in
Tatsuoka & Tatsuoka (1980, 198[?]) while ECI2 may function similarly to
the group dependent indices, i.e., Sato's caution index (1975) or the
norm conformity index (Tatsuoka & Tatsuoka, 1980, 1982a). The ICI has
proven to be effective in spotting the aberrant response patterns
resulting from consistent application of erroneous rules of operation
(Tatsuoka & Tatsuoka, 1981). Our prediction with regard to detection
rates of erroneous rules of operation is that ECI4 should be better than
ECI2.
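
The standardization in Equations (23) and (24) can be sketched in Python as follows. This is an illustrative sketch rather than the authors' PLATO program: the function name and data are hypothetical, sigma_ij^2 is taken as P_ij(1 - P_ij), and cov(P_i, G) is assumed to be positive so that dividing by the standard error does not flip the sign.

```python
import math

def standardized_ecis(y_i, p_i, G):
    """Return (ECI2z, ECI4z) for one subject, following Eqs. (23)-(24).

    y_i: observed binary response vector.
    p_i: probability vector for the same subject.
    G:   GRC vector.
    Assumes sigma_ij^2 = P_ij (1 - P_ij) and cov(P_i, G) > 0.
    """
    n = len(p_i)
    T_i = sum(p_i) / n
    T = sum(G) / n
    resid = [p - y for p, y in zip(p_i, y_i)]        # P_i - y_i
    r_bar = sum(resid) / n
    sigma2 = [p * (1 - p) for p in p_i]
    # n * cov(P_i - y_i, G) and n * cov(P_i - y_i, P_i)
    num2 = sum((r - r_bar) * (g - T) for r, g in zip(resid, G))
    num4 = sum((r - r_bar) * (p - T_i) for r, p in zip(resid, p_i))
    den2 = math.sqrt(sum(s * (g - T) ** 2 for s, g in zip(sigma2, G)))
    den4 = math.sqrt(sum(s * (p - T_i) ** 2 for s, p in zip(sigma2, p_i)))
    return num2 / den2, num4 / den4

# Hypothetical 4-item example: a conforming and a reversed pattern.
p_i = [0.9, 0.7, 0.4, 0.2]
G = [0.8, 0.6, 0.5, 0.3]
z_conforming = standardized_ecis([1, 1, 0, 0], p_i, G)
z_reversed = standardized_ecis([0, 0, 1, 1], p_i, G)
```

In this toy case the reversed pattern (hard items right, easy items wrong) gives large positive values on both indices, while the conforming pattern gives negative values.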

It should be noted that the scales of the original ECIs are
functions of $\theta$ but those of the standardized ECIs no longer
depend on $\theta$. As a result, two ECI4z (or ECI2z) values obtained
from different $\theta$ levels are comparable in terms of the extent of
anomaly they signify. However, the density functions of ECI2z and ECI4z
have to be investigated in order to determine their differences
statistically. Figures 5 and 6 show the goodness-of-fit test of the
normal distribution for ECI2z and ECI4z.

Insert Figures 5 & 6 about here

Appendices I and II give the tests of the normal distribution for ECI1z
and lz (Levine & Drasgow's standardized appropriateness measure, 1982),
while Appendices III, IV and V give the goodness-of-fit tests of beta
distributions for ECI1z, ECI2z, and ECI4z. The data used in these
figures are based on 2,400 students' scores obtained from a math test
(National Assessment of Educational Progress series, mathematics for 13
year olds, Booklet 4). As can be seen in the figures, both the
standardized ECIs fit normal distributions well. Similar results are
obtained from the NAEP data, Booklet 5.

Appendices VII, VIII, IX and X give the standard errors of ECI1z,
ECI2z, and ECI4z and the expectation of ECI4z, obtained from the NAEP
data. Although the NAEP data are used for testing "goodness of fit" of
the ECIs with theoretical distributions, we will go back to the
signed-number data in order to investigate the detection rate of
aberrant response patterns by the standardized ECIs. The next section
gives a brief description of the dataset and the procedure for the
comparisons.

A brief description of the dataset

Birenbaum and Tatsuoka (1982a) have demonstrated that the
traditional zero-one scoring of incorrect and correct answers does not
reflect a student's performance correctly because several erroneous
rules frequently yield the right answer for some problems. By extensive
error analysis performed on the original dataset (the 127 eighth graders'
test scores for signed-number subtraction problems) Birenbaum and
Tatsuoka (1980) identified erroneous rules that were consistently
applied by certain students. They rescored ones to zeros for items that
students got right for the wrong reasons. The dataset used in Figures 1
through 4 is the modified dataset, in which the zero-one scores
should reflect the students' performance more accurately than the
original dataset of N = 127. The modified dataset was much more nearly
unidimensional and had higher item-item and item-total correlations
than the original, while the item means and standard deviations remained
almost the same (Birenbaum & Tatsuoka, 1982a). Fifteen erroneous rules
were randomly selected from the 45 erroneous rules listed in Tatsuoka &
Tatsuoka (1981) and responses based on these were added to the modified
dataset. We refer to the new dataset of N = 142 as "Bugdata" hereafter.

Comparison of detection rates of ECI2z and ECI4z with respect to
their 80% Intervals

By using the item parameters estimated from the modified dataset,
ECI2z and ECI4z for the 142 subjects in the Bugdata set were calculated
and plotted against the true scores. Figure 7 is the scatterplot of
ECI4z against the true scores and Figure 8 is ECI2z against the same
true scores. The 15 bugs are marked by a small circle "o" with the
numbers and 89 real data points are marked by a plus sign without
being numbered.

Insert Figures 7 & 8 about here

The 80% intervals for both the ECIs and lz are constructed and
listed in Table 1 along with the means and standard deviations of the
indices. These are the intervals within which, theoretically, the
values of the indices associated with 80% of the non-aberrant responses
should fall.

Insert Table 1 about here

The intervals are marked by broken lines in Figures 7 and
8. We may choose, as a convenient decision rule, to classify response
patterns with index values outside these intervals as "aberrant." The
proportions of real response patterns classified as "aberrant" (which
are essentially false alarm rates) by the four indices are shown in
Table 2 along with the proportions of the 15 bugs that are detected.
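
The decision rule just described can be sketched as follows. The report does not state how the 80% intervals were computed; this sketch assumes they are central normal intervals, mean plus or minus about 1.28 standard deviations, which reproduces the Table 1 entries to within rounding. The function names are hypothetical, and the means and standard deviations are those listed in Table 1.

```python
from statistics import NormalDist

def central_interval(mean, sd, coverage=0.80):
    """Central interval of a normal distribution with the given coverage.

    Assumption of this sketch: the 80% intervals are mean +/- z * sd,
    where z is the upper (1 + coverage) / 2 quantile of the standard normal.
    """
    z = NormalDist().inv_cdf(0.5 + coverage / 2.0)
    return mean - z * sd, mean + z * sd

def flag_rate(values, interval):
    """Proportion of index values falling outside the interval."""
    lo, hi = interval
    flagged = [v for v in values if v < lo or v > hi]
    return len(flagged) / len(values)

# Means and standard deviations as reported in Table 1.
TABLE_1 = {
    "ECI1z": (0.001, 1.105),
    "ECI2z": (0.020, 1.230),
    "ECI4z": (0.019, 1.229),
    "lz": (0.017, 0.619),
}
intervals = {name: central_interval(m, s) for name, (m, s) in TABLE_1.items()}
```

Applying `flag_rate` to the index values of the real students gives a false alarm rate, and applying it to the values of the erroneous-rule patterns gives a detection rate, in the sense used for Table 2.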

In s ert Table 2 about her e 

The unstandardlzed ECI4 seemed to have the best detection rates in 

r . 

comparison with the other four ECIs (Tatsuoka & Lina, 1982) but lost its 
high rate after it was standardized. Exactly the s^i^e dataset is used 
in both the cases, the standardized and unstandardlzed fourth extended 
caution index. In Table 2, the false alarm rates of the four indices 
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FIGURE 7! Plot of ECI4 z Against True Score for the Modified Dataset {"+") 

and Erroneous Rules ("o"), and 80% Probability Interval (-1.55,1,59). 
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FIGURE 8: Plot of ECI2 z Against True Score for ttie Modified Dotoset ("+") 
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Table 1 





The 


80% Intervals 


of ECU , 






ECI2^ , ECU^ 


and Iz. 


Indices 


Mean 


S.D. 


80% confidence interval 


ECIl^ 


.001 


1.105 


(-1,414, 1.416) 


£012^ 


.020 


1.230 


(-1.555, 1.594) 


EC 14 

2 


.019- 


1.229 


(-1.554, 1.593) 


Iz 


.017 


.619 


( -.775, .809) 
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Table 2 

Detection Rates of Erroneous Rules by Four 

Personal Indices Based on Item Response Theory 

with Bugdataset 

Real Students Erroneous Rules 
N = 89 N = 15 

ECU .22 .60 

z 

EC12^ .15 .53 

gCIA^ .17 .67 

Iz .18 .67 
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vary around 20% as they should, while the correct detection rate 
fluctuates around 60%. Considering the fact that the false alarm rate 
for the 89 students by using ICI with total scores (ICI > .90 and scores 
lower than a certain criterion, Tatsuoka & Tatsuoka, 1981) was less than 
5%, the results summarized in Table 2 are not as good as we had 
expected. One reason for the low detection rates may be the fact that 
the modification procedure of rescoring in the original dataset was 
carried out by an intuitive error analysis, and hence there are some 
responses affected by persistent misconceptions left in the modified 
dataset. Table 3 lists the percentage of "bugs" left in the modified 
dataset. The total number of bugs (including repetitions) has become 
42. The mean absolute value of ECIA^ in the two groups described in 
Table 3 are 3.141 for the bugs that were not found in the modified 
dataset, 1.353 for the bugs left In. However, the value of £014^, 
1*353, is still substantially high in comparison with the majority of 
real responses in the modified dataset. 



Insert Table 3 about here 

Summary and Discussion 
The extended caution indices, ECU, ECI2 and ECI4 are standardized 

by the usual transformation, 

ECIm - E(ECImlei) 

ECImg o 2, and 4. 

SE(ECImlei) 

The conditional expectation of ECI4 is a function of the θ level, but
those of the other two ECIs are identically zero. If we sample two
students from different θ levels, then it is dangerous to compare their
ECI4 values in order to determine which student's response pattern is
more aberrant than the other. Moreover, the standard errors of all
three ECIs are functions of θ and have U-shaped trend curves. This
explains the past findings that the correlations of personal indices,
such as the caution index, NCI, or ICI, with total scores vary according
to the shapes of the total-score distributions. The findings are that
if the total-score distribution has a negative skewness, then the
correlation is positive; if the distribution is positively skewed, then
a negative correlation results (Harnisch & Linn, 1981; Tatsuoka &
Tatsuoka, 1980). Since the ECIs are natural extensions of the caution
index, we can safely impute some behaviors of ECIs to these discrete
personal indices as well. ECIs provide inflated values at both the
extremely high and low total scores. With the standardized ECIs, the
bias of the values at the extreme scores is corrected, and moreover the
responses from different levels of θ can be compared safely.

Table 3

Percentage of Each Bug that was not Rescored and Remained
in the November Modified Dataset (n = 8, N = 89), 356 Sets of Responses

Bugs       %       Total Scores     ECI4z*

Group 1
  1        0            4            3.728
  3        0            3            4.309
  4        0            2            4.259
  8        0            6            3.059
 10        0            3            4.045
 12        0            2           -1.247
 13        0            1            1.338

Group 2
 --       --           --            2.554
 --       --           --            1.435
 --       --           --            2.197
 --       --           --             .631
 --       --           --            -.887
 11      .014           1            1.084
 14      .014           6            1.162
 15      .048           7             .876

(Entries marked -- are not legible in this copy. Four percentages survive
for the first five Group 2 rows, .006, .014, .003 and .008, but their row
assignment does not.)

*Mean of Group 1 = 3.141, S.D. = .503
 Mean of Group 2 = 1.353, S.D. = .240

It would be ideal if the theoretical distribution of the 
standardized extended caution indices could be derived algebraically, 
but goodness-of-fit tests of the ECIzs against normal distributions provide
satisfactory evidence that they may follow approximately normal 
distributions. 
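
The text does not name the goodness-of-fit test. The appendix captions compare a cumulative step function with a theoretical distribution, which suggests a Kolmogorov-Smirnov type of comparison, and the sketch below assumes that reading. It computes the largest gap between the empirical step function and the standard normal distribution function. The sample is simulated and stands in for standardized index values; it is not the report's data.

```python
import math
import random


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def ks_statistic(sample, cdf=normal_cdf):
    """Largest vertical gap between the empirical step function and cdf."""
    xs = sorted(sample)
    n = len(xs)
    if n == 0:
        raise ValueError("sample must not be empty")
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The step function jumps from i/n to (i+1)/n at x; check both sides.
        d = max(d, abs((i + 1) / n - f), abs(f - i / n))
    return d


# Simulated stand-in for 2400 standardized index values (not the report's data).
rng = random.Random(0)
simulated = [rng.gauss(0.0, 1.0) for _ in range(2400)]
d = ks_statistic(simulated)
# Large-sample 5% critical value for the one-sample test: about 1.36 / sqrt(n).
critical = 1.36 / math.sqrt(len(simulated))
print(f"D = {d:.4f}, approximate 5% critical value = {critical:.4f}")
```

A value of D below the critical value gives no evidence against normality at the 5% level. A different reference distribution can be checked by passing its distribution function as `cdf`.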

The detection rates of "bugs" are unexpectedly low.
We have tried to find the reason for this by investigating each response
pattern in the modified dataset. The results indicate that if an 
otherwise normal dataset includes a considerable number of aberrant 
response patterns, then these patterns are no longer detectable with 
high probability by the ECI approach. A new method to detect such 
aberrant response patterns should be investigated in the future. 
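
For concreteness, a detection rate of the kind discussed here can be sketched as the share of erroneous-rule response patterns whose index value falls outside a probability interval established on the modified dataset. That definition is our reading and is stated as an assumption. The interval used below is the 80% interval quoted for Iz in the appendix captions, and the index values are hypothetical.

```python
def detection_rate(values, lower, upper):
    """Fraction of values that fall outside the interval [lower, upper]."""
    if not values:
        raise ValueError("values must not be empty")
    if lower > upper:
        raise ValueError("lower must not exceed upper")
    flagged = sum(1 for v in values if v < lower or v > upper)
    return flagged / len(values)


# Hypothetical index values for response patterns generated by erroneous rules.
bug_values = [-1.9, -0.4, 0.2, 1.3, -0.9, 0.6, 2.1, -0.1]
# 80% probability interval quoted for Iz in the appendix captions.
rate = detection_rate(bug_values, lower=-0.78, upper=0.81)
print(f"detected {rate:.0%} of the erroneous-rule patterns")
```

If many aberrant patterns are left in the dataset that defines the interval, the interval widens and the rate computed this way falls, which is consistent with the explanation offered above.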



Rudner (1982) recently conducted a Monte Carlo study to compare the
detection rates of various indices. He found that the indices based on
item response theory performed consistently better with his data than
the indices based on sample statistics alone. But IRT is not always
applicable in practice. An advantage of ECIs in comparison with other
appropriateness indices or Wright's index is that one can start from
the caution index when a sample is small and then shift to
ECIs as the sample size becomes larger, without loss of continuity,
because ECIs are natural extensions of the S-P curve theory. However,
further investigation of the relationships between the original caution
index and the ECIs will be needed.
Captions of Appendices

I.     Goodness of Fit Test for the Normal Distribution: The
       Step Function is the Cumulative Distribution of ECI1z

II.    Goodness of Fit Test for the Normal Distribution: The
       Step Function is the Cumulative Distribution of Iz

III.   Goodness of Fit Test for the Beta Distribution: The
       Step Function is the Cumulative Distribution of ECI1z

IV.    Goodness of Fit Test for the Beta Distribution: The
       Step Function is the Cumulative Distribution of ECI2z

V.     Goodness of Fit Test for the Beta Distribution: The
       Step Function is the Cumulative Distribution of ECI4z

VI.    Plot of Iz Against True Score for the Modified Dataset
       ("+") and Erroneous Rules ("o"), and 80% Probability
       Interval (-.78, .81)

VII.   Standard Error of ECI1

VIII.  Standard Error of ECI2

IX.    Standard Error of ECI4

X.     Plot of Expectation of ECI4 Against True Score

XI.    Correlation Matrix of Standardized ECIs and Iz with Bugdata
APPENDIX III: Goodness of Fit Test for the Beta Distribution:
The Step Function is the Cumulative Distribution of ECI1z

a = 4.89
b = 6.62
N = 2400

APPENDIX IV: Goodness of Fit Test for the Beta Distribution:
The Step Function is the Cumulative Distribution of ECI2z

APPENDIX VI: Plot of Iz Against True Score for the Modified Dataset ("+")
and Erroneous Rules ("o"), and 80% Probability Interval (-.78, .81).

Appendix X

Plot of Expectation of ECI4 Against True Score
Appendix XI

Correlation Matrix of Standardized ECIs and Iz
With Bugdata

         ECI1z   ECI2z   ECI4z    Iz     Total   True
                                         Score   Score
           1       2       3       4       5       6

  1      1.00     .99     .92    -.88    -.11    -.14
  2              1.00     .93     --     -.11    -.14
  3                      1.00